DeepViewAgg [CVPR'22 Best Paper Finalist 🎉]

arXiv | Paper | Supplementary | Project Page | Video | Poster | CV News

Official repository for the paper Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation 📄, selected for an oral presentation at CVPR 2022.

We propose to exploit the synergy between images and 3D point clouds by learning to select the most relevant views for each point. Our approach uses the viewing conditions of 3D points to merge features from images taken at arbitrary positions. We reach SOTA results on S3DIS (74.7 mIoU 6-Fold) and KITTI-360 (58.3 mIoU) without requiring point colorization, meshing, or depth cameras: our full pipeline only requires raw 3D scans and a set of images and poses.
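For intuition only, the sketch below shows one way such viewing-condition-based aggregation can be written down: each candidate view of a point provides an image feature vector and a small descriptor of its viewing conditions (e.g. distance, viewing angle), an MLP scores each view, and the scores are softmax-normalized to pool the image features per point. The module, its inputs, and its dimensions are hypothetical simplifications, not the repository's actual DeepViewAgg implementation.

```python
import torch
import torch.nn as nn

class ViewAggregationSketch(nn.Module):
    """Toy view aggregation: score each candidate view from its viewing conditions,
    then softmax-pool the corresponding image features per 3D point."""

    def __init__(self, cond_dim=4, hidden_dim=32):
        super().__init__()
        self.score_mlp = nn.Sequential(
            nn.Linear(cond_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1))

    def forward(self, view_feats, view_conds, view_mask):
        # view_feats: (N, V, C) image features mapped to each of N points from V views
        # view_conds: (N, V, cond_dim) viewing conditions (e.g. distance, viewing angle)
        # view_mask:  (N, V) bool, True where the view actually sees the point
        scores = self.score_mlp(view_conds).squeeze(-1)      # (N, V)
        scores = scores.masked_fill(~view_mask, -1e9)        # effectively ignore unseen views
        weights = torch.softmax(scores, dim=1).unsqueeze(-1) # (N, V, 1)
        return (weights * view_feats).sum(dim=1)             # (N, C) pooled image features
```

In the actual models, the pooled image features are then fused with the 3D network's point features; the model names below mention early fusion, and the change log mentions intermediate fusion.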

Coming soon 🚨 🚧

Change log

2023-01-11 Fixed a bug when using intermediate fusion
2022-04-27 Added pretrained weights and features to help reproduce our results
2022-04-20 Added notebooks and scripts to get started with DeepViewAgg

Requirements 📝

The following must be installed before installing this project.

- Anaconda3
- cuda >= 10.1
- gcc >= 7

All remaining dependencies (PyTorch, PyTorch Geometric, etc.) should be installed using the provided installation script.

The code has been tested in the following environment:

- Ubuntu 18.04.6 LTS
- Python 3.8.5
- PyTorch 1.7.1
- CUDA 10.2, 11.2 and 11.4
- NVIDIA V100 32G
- 64G RAM
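As an optional sanity check of your own setup (assuming PyTorch is already installed), you can print the versions it sees and compare them against the tested configuration above:

```python
import torch

# Report the PyTorch / CUDA setup visible to Python.
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```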

Installation 🧱

To install DeepViewAgg, simply run ./install.sh from inside the repository.

You will need sudo rights to install the MinkowskiEngine and TorchSparse dependencies.

⚠️ Do not install Torch-Points3D from the official repository, or with pip.

Disclaimer

This is not the official Torch-Points3D framework. This work builds on and modifies a fixed version of the framework and has not been merged with the official repository yet. In particular, this repository introduces numerous features for multimodal learning on large-scale 3D point clouds. In this repository, some TP3D-specific files were removed for simplicity.

Project structure

The project follows the original Torch-Points3D framework structure.

├─ conf                  # All configurations live there
├─ notebooks             # Notebooks to get started with multimodal datasets and models
├─ eval.py               # Eval script
├─ install.sh            # Installation script for DeepViewAgg
├─ scripts               # Some scripts to help manage the project
├─ torch_points3d
│  ├─ core               # Core components
│  ├─ datasets           # All code related to datasets
│  ├─ metrics            # All metrics and trackers
│  ├─ models             # All models
│  ├─ modules            # Basic modules that can be used in a modular way
│  ├─ utils              # Various utils
│  └─ visualization      # Visualization
└─ train.py              # Main script to launch a training

Several changes were made to extend the original project to multimodal learning on point clouds with images. The most important ones can be found in the following:

- conf/data/segmentation/multimodal: configs for the 3D+2D datasets.
- conf/models/segmentation/multimodal: configs for the 3D+2D models.
- torch_points3d/core/data_transform/multimodal: transforms for 3D+2D data.
- torch_points3d/core/multimodal: multimodal data and mapping objects.
- torch_points3d/datasets/segmentation/multimodal: 3D+2D datasets (e.g. S3DIS, ScanNet, KITTI360).
- torch_points3d/models/segmentation/multimodal: 3D+2D architectures.
- torch_points3d/modules/multimodal: 3D+2D modules. This is where the DeepViewAgg module can be found.
- torch_points3d/visualization/multimodal_data.py: tools for interactive visualization of multimodal data.

Getting started 🚀

Notebook to create a synthetic toy dataset and get familiar with the construction of 2D-3D mappings:

notebooks/synthetic_multimodal_dataset.ipynb
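If you just want a feel for what a point-image mapping is before opening the notebook, the snippet below shows a bare-bones pinhole projection with NumPy: world points are moved into a camera frame and projected to pixel coordinates, keeping only the points that land inside the image. The camera parameters and function are made up for illustration; the repository's actual mapping construction is more involved.

```python
import numpy as np

def project_points(points_world, T_cam_world, K, img_size):
    """Project 3D points into an image with a basic pinhole model (illustration only).

    points_world: (N, 3) points in world coordinates
    T_cam_world:  (4, 4) world-to-camera extrinsic matrix
    K:            (3, 3) camera intrinsics
    img_size:     (width, height) of the image in pixels
    Returns pixel coordinates and the indices of the points mapped to them.
    """
    n = points_world.shape[0]
    pts_h = np.hstack([points_world, np.ones((n, 1))])  # homogeneous coordinates
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]          # move to camera frame
    in_front = pts_cam[:, 2] > 1e-6                     # discard points behind the camera
    idx = np.nonzero(in_front)[0]
    uvw = (K @ pts_cam[in_front].T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                       # perspective division
    w, h = img_size
    in_image = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[in_image], idx[in_image]
```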

Notebooks to create datasets, get familiar with dataset configuration, and produce interactive visualizations. You can also run inference from a checkpoint and visualize predictions:

- notebooks/kitti360_visualization.ipynb (at least 350G of memory 💾)
- notebooks/s3dis_visualization.ipynb (at least 400G of memory 💾)
- notebooks/scannet_visualization.ipynb (at least 1.3T of memory 💾)

Notebooks to create multimodal models, get familiar with model configuration and run forward and backward passes for debugging:

notebooks/multimodal_model.ipynb
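The notebook does this with the actual multimodal models; as a generic reminder of the forward/backward smoke-test pattern (with a stand-in model and random data, not the repository's API), it boils down to:

```python
import torch
import torch.nn as nn

# Stand-in model and data: replace with the multimodal model and batch built in the notebook.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 13))  # e.g. 13 S3DIS classes
points = torch.randn(1024, 16)           # dummy per-point features
labels = torch.randint(0, 13, (1024,))   # dummy semantic labels

logits = model(points)                                    # forward pass
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                                           # backward pass
print("loss:", loss.item(),
      "grad norm:", sum(p.grad.norm() for p in model.parameters()).item())
```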

Notebooks to run full inference on multimodal datasets from a model checkpoint. These should allow you to reproduce our results using the pretrained models listed in the Models section:

- notebooks/kitti360_inference.ipynb
- notebooks/s3dis_inference.ipynb
- notebooks/scannet_inference.ipynb
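Before running these notebooks, you may want to check what a downloaded checkpoint contains. Assuming the pretrained weights are standard PyTorch checkpoint files (the path below is hypothetical and the exact layout depends on the release), a quick inspection looks like:

```python
import torch

# Hypothetical path: point it to a checkpoint downloaded from the Models table below.
ckpt_path = "Res16UNet34-L4-early.pt"
ckpt = torch.load(ckpt_path, map_location="cpu")

# Such checkpoints are typically dictionaries; list the top-level entries to locate
# the model state dict and training metadata before loading them in the notebooks.
if isinstance(ckpt, dict):
    for key in ckpt:
        print(key)
else:
    print(type(ckpt))
```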

Scripts to replicate our paper's best experiments 📈 for each dataset:

- scripts/train_kitti360.sh
- scripts/train_s3dis.sh
- scripts/train_scannet.sh

If you need to go deeper into this project, see the Documentation section.

If you have trouble using these or need to reproduce other results from our paper, create an issue or leave me a message 💬!

Models

Model name | Dataset | mIoU | 💾 | 👇
Res16UNet34-L4-early | S3DIS 6-Fold | 74.7 | 2.0G | link
Res16UNet34-PointPyramid-early-cityscapes-interpolate | KITTI-360 | 61.7 Val / 58.3 Test | 339M | link
Res16UNet34-L4-early | ScanNet | 71.0 Val | 341M | link

Documentation 📚

The official documentation of PyTorch Geometric and Torch-Points3D is a good starting point, since this project largely builds on top of these frameworks. For DeepViewAgg-specific features (i.e. everything that concerns multimodal learning), the provided code is commented as much as possible, but hit me up 💬 if some parts need clarification.

Visualization of multimodal data 🔬

We provide code to produce interactive and sharable HTML visualizations of multimodal data and point-image mappings:

Examples of such HTML visualizations produced on S3DIS Fold 5 are zipped here and can be opened in your browser.

Known issues

- Setting use_faiss=True or use_cuda=True to accelerate PCAComputePointwise, MapImages or NeighborhoodBasedMappingFeatures may cause issues. As suggested here, one should stick to the CPU-based computation for now.

Credits 💳

This implementation of DeepViewAgg largely relies on the Torch-Points3D framework, although it has not been merged with the official project at this point. For datasets, some code from the official KITTI-360 and ScanNet repositories was used.

Reference

If you use all or part of this code, please cite the following paper:

@inproceedings{robert2022dva,
  title={Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation},
  author={Robert, Damien and Vallet, Bruno and Landrieu, Loic},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5575--5584},
  year={2022},
  url={https://github.com/drprojects/DeepViewAgg}
}


【本文地址】


今日新闻


推荐新闻


CopyRight 2018-2019 办公设备维修网 版权所有 豫ICP备15022753号-3